Skip to content

refactor(session_store): thread base_dir + KeyringMode through API (#34)#43

Merged
hanwencheng merged 3 commits intomainfrom
fix/issue-34-session-store-base-dir
Apr 16, 2026
Merged

refactor(session_store): thread base_dir + KeyringMode through API (#34)#43
hanwencheng merged 3 commits intomainfrom
fix/issue-34-session-store-base-dir

Conversation

@hanwencheng
Copy link
Copy Markdown
Member

Summary

Option 2 follow-up to #42 for issue #34. Replaces the implicit \$HOME / AGENTKEYS_SESSION_STORE env-var reads at every session_store::* call with an explicit SessionStore handle that owns base_dir + KeyringMode. Tests construct one rooted at a per-test tempdir in file-only mode — no more unsafe { set_var }, no #[serial], no process-global races, no new dev-deps.

Why this on top of #42

#42 patched the symptom with #[ctor] + #[serial], which works but:

  • Still compiles unsafe { set_var } into the test binary (Rust 2024 flags this as unsound)
  • Requires every future HOME-touching test to remember #[serial] — silent flake if forgotten

This refactor removes the env-mutation pattern entirely. Tests can't reintroduce the race because the test helpers no longer expose HOME as a knob. It also unlocks a real product feature: callers can point the CLI at a non-default base directory without touching the process environment (future `AGENTKEYS_HOME` / `--data-dir` / container deploys).

Changes

`agentkeys-core/src/session_store.rs` — introduce `SessionStore` struct with `base_dir` + `KeyringMode`, with `new` / `file_only` / `from_env` constructors and method-form `save` / `load` / `clear` / `list_ids` / `session_path` / `load_with_legacy_fallback`. Existing free functions are preserved as thin wrappers that delegate to `SessionStore::from_env()`, so the daemon and any external callers keep working unchanged. Internal `marker_path` and file-I/O helpers are now methods on `SessionStore`. The module's own tests drop the `with_temp_home` mutex + `unsafe { set_var }` helper in favor of a per-test `SessionStore::file_only(tmp)`.

`agentkeys-cli/src/lib.rs` — add optional `session_store_override` field on `CommandContext` and a `.with_session_store(store)` builder. New `session_store()` accessor returns the override or `SessionStore::from_env()`. The five `session_store::*` call sites in `cmd_init`, `cmd_revoke` (x2), `cmd_recover`, and `CommandContext::load_session` now route through it.

`agentkeys-cli/tests/cli_tests.rs` — delete all 13 `unsafe { set_var }` calls and all `HOME` mutation. Every test owns a `(SessionStore, TempDir)` pair and threads the store through context builders.

Test plan

  • `cargo test --workspace` — all 151 tests pass (agentkeys-types 3, agentkeys-core 22, agentkeys-mock-server 56, agentkeys-cli 37, agentkeys-daemon 15, agentkeys-mcp 15, agentkeys-provisioner 3)
  • `cargo test -p agentkeys-cli` × 5 consecutive runs, parallel scheduler — 37/37 each time, 0.21s–0.68s
  • `cargo clippy -p agentkeys-core -p agentkeys-cli --tests` — no new lints (pre-existing warnings in unrelated files unchanged)
  • Workspace daemon unchanged — production call sites still resolve through the legacy free-function wrappers

Relationship to #42

Alternative approach — either can land. If #42 lands first, this PR will need a trivial rebase to delete the `ctor` / `serial_test` dev-deps and the `#[ctor::ctor]` block. If this lands first, #42 can be closed as superseded.

Closes #34.

Option 2 follow-up to #42.

Replaces the implicit `$HOME` / `AGENTKEYS_SESSION_STORE` env-var reads
at every call with an explicit `SessionStore` handle that owns both. Tests
construct one rooted at a per-test tempdir in file-only mode — no more
`unsafe { set_var }`, no `#[serial]`, no process-global races.

Changes
- agentkeys-core/src/session_store.rs: introduce `SessionStore` struct
  with `base_dir` + `KeyringMode`, expose `new`, `file_only`, `from_env`
  constructors and method-form save/load/clear/list/session_path.
  Existing free functions (`save_session`, `load_session`, `clear_session`,
  `fallback_path`, `list_fallback_session_ids`,
  `load_session_with_legacy_fallback`) are preserved as thin wrappers that
  delegate to `SessionStore::from_env()`, so the daemon and any external
  callers keep working unchanged. Internal `marker_path` and file-I/O
  helpers are now methods on `SessionStore`.
- agentkeys-core session_store tests: drop the `with_temp_home` mutex +
  `unsafe { set_var }` helper and rewrite to build a per-test
  `SessionStore::file_only(tmp)` directly. Fully parallel, fully hermetic.
- agentkeys-cli CommandContext: add optional `session_store_override`
  field and `.with_session_store(store)` builder. New `session_store()`
  accessor returns the override or `SessionStore::from_env()`. The five
  `session_store::*` call sites in `cmd_init`, `cmd_revoke` (x2),
  `cmd_recover`, and `CommandContext::load_session` now route through it.
- agentkeys-cli integration tests: delete all 13 `unsafe { set_var }`
  calls and all HOME mutation. Every test owns a `(SessionStore, TempDir)`
  pair and threads the store through context builders. Drops the flaky
  race surface (issue #34) by construction rather than by serialization.

Why this on top of #42
- #42 patched the symptom with `#[ctor]` + `#[serial]`, which works but
  still compiles `unsafe { set_var }` into the test binary and requires
  future HOME-touching tests to remember `#[serial]`. This refactor
  removes the env-mutation pattern entirely — tests can't reintroduce
  the race because the test helpers no longer expose HOME as a knob.
- Also unlocks a genuine product feature: callers can point the CLI at
  a non-default base directory without touching the process environment
  (future `AGENTKEYS_HOME` / `--data-dir` / container deploys).

Test plan
- `cargo test --workspace` — all 151 tests pass
- `cargo test -p agentkeys-cli` × 5 consecutive runs, parallel scheduler —
  37/37 each time, ~0.2–0.7s
- `cargo clippy -p agentkeys-core -p agentkeys-cli --tests` — no new
  lints (pre-existing warnings in unrelated files unchanged)

Closes #34.
…n-by-construction (codex PR #43 [P2])

Codex /codex review on PR #43 flagged that `SessionStore::new(base_dir, KeyringMode::Auto)` was unsafe: keyring entries are keyed on `session_id` alone and ignore `base_dir`, so two stores at different roots sharing a `session_id` would silently alias through the OS keychain. The newly introduced API promised per-root isolation but the `Auto` path violated that promise.

Fix: remove the public `SessionStore::new(base_dir, mode)` constructor. The only public constructors are now:
- `SessionStore::file_only(base_dir)` — explicit custom root, always file-only, safe by construction
- `SessionStore::from_env()` — home-rooted (the single-root invariant the keyring namespace assumes), may select `KeyringMode::Auto`

Custom-root callers physically cannot opt into keyring aliasing. No callers (internal or external) used `::new` with `KeyringMode::Auto` and a custom `base_dir`, so no migration is needed.

Tests: `cargo test -p agentkeys-core -p agentkeys-cli` → 22 core + 37 cli, 0 failures, 0.78s.

Refs #34, addresses codex [P2] on PR #43.
@hanwencheng
Copy link
Copy Markdown
Member Author

@claude review the PR

@claude
Copy link
Copy Markdown

claude Bot commented Apr 16, 2026

Claude finished @hanwencheng's task in 2m 10s —— View job


PR Review: refactor(session_store): thread base_dir + KeyringMode through API (#34)

Verdict: Approve with suggestions — This is a well-motivated, clean refactor. The core design is sound, and the test migration is the right call over the ctor+serial_test approach in #42. A few issues worth addressing before merge.


  • Read changed files
  • Analyze session_store.rs changes
  • Analyze cli/lib.rs changes
  • Analyze cli_tests.rs changes
  • Post review feedback

Bugs / Correctness

1. Redundant self.load() at the end of load_with_legacy_fallback

crates/agentkeys-core/src/session_store.rs:242

pub fn load_with_legacy_fallback(&self, session_id: &str) -> Result<Session> {
    if let Ok(s) = self.load(session_id) {
        return Ok(s);  // Fast path — already failed here
    }
    if session_id == "master" {
        // ...legacy paths...
    }
    self.load(session_id)  // ← Will always fail; same as the initial load
}

After the initial self.load fails and all legacy fallbacks are exhausted, calling self.load again is redundant — it will produce the same error as the first call, just with a less useful message (no mention of the legacy paths that were tried). Replace with a direct Err:

Err(anyhow::anyhow!("no session found for id={session_id} (and no legacy fallback matched)"))

Fix this →


2. list_ids returns sanitized names — round-trip list_ids → load silently double-sanitizes for unusual session IDs

crates/agentkeys-core/src/session_store.rs:194-215

Directory names on disk are the sanitized forms. list_ids returns them verbatim, but load(returned_id) re-applies sanitize_for_keyring — meaning any session ID that was rewritten (e.g. foo/bar__agk_foo_bar-cc5d46bd) would be double-sanitized and the load would fail. In practice this is harmless because all real session IDs (master, daemon-0xABC, etc.) are valid under the sanitizer. But this is a latent correctness hazard. At minimum add a comment:

// NOTE: returned names are the on-disk (sanitized) forms. For all production
// session IDs (alphanumeric + dash) sanitize_for_keyring is a no-op, so these
// can be passed directly to load(). If you ever call list_ids on a store that
// might hold rewritten IDs, be aware of this.

Design Concerns

3. session_store() re-reads env vars on every call

crates/agentkeys-cli/src/lib.rs:110-114

pub fn session_store(&self) -> SessionStore {
    self.session_store_override
        .clone()
        .unwrap_or_else(SessionStore::from_env)  // reads $HOME on every call
}

cmd_revoke calls ctx.session_store() twice in the same function body (once for each branch). from_env re-reads $HOME and AGENTKEYS_SESSION_STORE each time and allocates a new PathBuf. This is cheap in wall-clock terms, but it means the effective store can differ between calls within a single command if the env is mutated concurrently (unlikely, but semantically surprising).

The simpler fix is to resolve from_env once at construction time and store it in session_store_override unconditionally:

pub fn new(backend_url: &str, verbose: bool, json_output: bool) -> Self {
    Self {
        // ...
        session_store_override: Some(SessionStore::from_env()),  // eager, once
    }
}

Then with_session_store just overrides it, and session_store() always returns a &SessionStore reference (or clone of the stored value). This removes the conceptual split between "override" and "default" entirely.


4. session_store_override is pub while a builder pattern exists

crates/agentkeys-cli/src/lib.rs:54

pub session_store_override: Option<SessionStore>,

External code can bypass the builder and directly mutate or inspect this field. Since the other override fields (session_override, backend_override) are also pub, this is consistent with the existing pattern — but it's worth considering pub(crate) for all three to make the builder the sole external API.


Style / Minor

5. _session naming in start_scope_test_server — used variable should not have leading underscore

crates/agentkeys-cli/tests/cli_tests.rs:799

let (_output, _session) = cmd_init(...).await.unwrap();
// ...
.header("authorization", format!("Bearer {}", _session.token))

_session is actively used on the very next line. Leading underscores conventionally signal "intentionally unused." Rename to session to match intent.

Fix this →


6. Legacy free functions have no deprecation marker

crates/agentkeys-core/src/session_store.rs:326-359

The PR preserves fallback_path, save_session, load_session, etc. as thin wrappers and the docs say "new code should prefer building a SessionStore explicitly." A #[deprecated] annotation would enforce this at the compiler level and prevent new callers from accidentally using the old API:

#[deprecated(since = "0.2.0", note = "Use SessionStore::from_env().session_path() instead")]
pub fn fallback_path(session_id: &str) -> PathBuf {
    SessionStore::from_env().session_path(session_id)
}

What's Done Well

  • The isolation-by-construction invariant (file_only ↔ custom root, Auto ↔ home root only) is clearly documented and enforced structurally. The comment at lines 62–71 is the right level of explanation for a subtle correctness property.
  • Test hermetic isolation is now correct: each test owns its TempDir and SessionStore, no shared global state, no #[serial] required.
  • Path traversal defense (sanitize_for_keyring + session_dir routing through it) is sound and well-tested.
  • Legacy wrappers preserving the existing daemon/CLI interface is the right call for a refactor of this scope.
  • The clear synchrony fix (keyring delete blocks before returning) and the list_ids sort are important correctness properties that survive the refactor intact.

…k fallthrough (claude PR #43 review #1)

The trailing `self.load(session_id)` at the end of `load_with_legacy_fallback` was redundant — the same call already failed at the top of the function, so re-running it produced the same error with no added context about which legacy fallbacks were tried.

Replace with `anyhow::bail!` that names the attempted session path, the keyring mode, and the fact that legacy fallbacks only trigger for `id="master"`. Debuggability win; same observable behavior (still returns Err).

Tests: `cargo test -p agentkeys-core -p agentkeys-cli` → 22 + 37, 0 failures.
@hanwencheng hanwencheng merged commit 5f08790 into main Apr 16, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

tests: process-global HOME mutation races in parallel tokio::test (cli_tests.rs)

1 participant